1. Summary 

Chromatin-binding proteins must navigate the complex nuclear milieu to find 
their sites of action, and a constellation of protein factors and other properties 
are likely to influence targeting specificity. Despite considerable progress, the 
precise rules by which binding specificity is achieved have remained elusive. 
Here, we consider early targeting events for two groups of chromatin-binding 
complexes in Drosophila: the Male-Specific Lethal (MSL) and the Polycomb 
group (PcG) complexes. These two serve as models for understanding targeting,
because they have been extensively studied and play vital roles in
Drosophila, and their targets have been documented at high resolution.
Furthermore, the proteins and biochemical properties of both complexes are
largely conserved in multicellular organisms, including humans. While the
MSL complex increases gene expression and PcG members repress genes, the two
groups share many similarities, such as the ability to modify their chromatin
environment to create active or repressive domains, respectively. With legacies of
in-depth genetic, biochemical and now genomic approaches, the MSL and 
PcG complexes will continue to provide tractable systems for understanding 
the recruitment of multiprotein chromatin complexes to their target loci. 



2. Introduction 

The primary unit of chromatin is the nucleosome, consisting of approximately 
147 base pairs of DNA wrapped around a histone octamer. Genetic information 
is encoded at the DNA level, but most interpretation of genetic information 
occurs at the protein level. In addition to histones, numerous non-histone
proteins interact with chromatin and DNA to bring about the proper, cell-specific
interpretation of the genome. 

One major question in chromatin biology is how protein players find their proper sites of action. In genomes ranging in size from megabases to gigabases of DNA, transcription factors, which recognize their sites in a DNA sequence-specific manner, are found at only a fraction of their numerous consensus sequences. It is no wonder, then, that a complete understanding of the potentially more complicated phenomenon of chromatin targeting has remained elusive (figure 1). Here, we focus on two groups of chromatin-bound factors in Drosophila: the male-specific lethal (MSL) complex and the Polycomb group (PcG). These two groups can be considered as models for addressing how chromatin factors are targeted, as their sites of action have been documented at high resolution.

Figure 1. Potential factors influencing the selection of chromatin-binding sites. Numerous characteristics may influence the selection of binding sites in the genome. DNA primary sequence and sequence composition, the local chromatin environment and long-range chromatin conformation, protein-protein interactions and non-coding RNA guidance may act synergistically for proper targeting of chromatin complexes to their sites of action.

The MSL complex regulates dosage compensation in Drosophila. Dosage compensation is the means by which X chromosome gene expression is adjusted to balance gene expression from the autosomes. In flies, this is achieved by hypertranscription of the active genes on the single male X chromosome. The MSL complex is made up of five core proteins (figure 2a and table 1) and two redundant non-coding RNAs (roX1 and roX2). The loss of any protein component or of both RNAs leads to male lethality. Downstream of the sex determination cascade, the MSL complex is assembled in males only and targeted exclusively to active genes on the X chromosome. Recruitment to the male X appears to occur in at least two steps (figure 2b). The first step involves initial targeting to several hundred chromatin entry sites (CESs; also called high-affinity sites) carrying a degenerate sequence motif. Two prominent CESs are the roX RNA genes that produce the ncRNA components of the complex. The second targeting step involves sequence-independent spreading in cis to most active genes. It is for this second step that roX RNAs seem most critical (reviewed in reference [1]). Evidence for this 'nucleate and spread' model comes from experiments in which the MSL complex becomes targeted to active genes flanking ectopic insertions of roX RNA transgenes on autosomes [51,52]. Therefore, the initial targeting of the complex to X chromosome entry sites is critical for its specificity. However, understanding how the initial CESs are selected has been challenging, because the associated sequence motif is enriched less than twofold on the X chromosome relative to the autosomes.
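The "less than twofold" figure is a ratio of motif densities, X versus autosomes. A minimal sketch of that calculation follows. It assumes, hypothetically, that the degenerate motif can be approximated by the (GA)4 core GAGAGAGA counted on either strand. The real motif definition and search stringency are not given here, and the function names are illustrative only.

```python
import re

# Hypothetical proxy for the degenerate CES motif: the (GA)4 core on either strand.
# The lookahead lets overlapping occurrences (e.g. inside a long GA repeat) count separately.
CORE = re.compile(r"(?=(GAGAGAGA|TCTCTCTC))")


def count_hits(seq: str) -> int:
    """Number of (possibly overlapping) motif occurrences in one sequence."""
    return len(CORE.findall(seq.upper()))


def x_enrichment(x_seq: str, autosome_seqs: list[str]) -> float:
    """Motif density on X divided by pooled motif density on the autosomes."""
    x_density = count_hits(x_seq) / len(x_seq)
    # Count each autosome separately before pooling, so no spurious hits span junctions.
    a_hits = sum(count_hits(s) for s in autosome_seqs)
    a_len = sum(len(s) for s in autosome_seqs)
    return x_density / (a_hits / a_len)


if __name__ == "__main__":
    # Toy sequences, not genomic data: 1 hit per 100 bp on "X", 1 hit per 200 bp on "autosomes".
    x = "GAGAGAGA" + "A" * 92
    autosomes = ["A" * 100, "TCTCTCTC" + "C" * 92]
    print(x_enrichment(x, autosomes))  # 2.0
```

On real chromosome sequences, a ratio between 1 and 2 would correspond to the modest enrichment described above.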

The PcG functions as a set of repressors that maintain the transcriptional inactivation of developmentally silenced genes. The PcG was originally identified as critical for maintaining the parasegment-specific pattern of Hox gene expression, and subsequent analysis has identified hundreds of PcG targets beyond the Hox gene clusters. The PcG is made up of approximately 20 proteins (figure 3a and table 1), which form several multiprotein complexes with slightly different genomic binding patterns and distinct biochemical activities. At individual target genes, Polycomb Response Elements (PREs) have been identified that can function in ectopic chromatin contexts, but these lack a strong consensus motif. A classical model for the targeting of PcG complexes is that they recognize PREs in silenced domains that were previously established by repressive, spatially restricted transcription factors. Once PcG complexes are initially targeted to PREs, they can be stably maintained at these loci even after the original silencing factors are no longer expressed. Like the MSL complex, the PcG may also have a spreading mechanism, as silenced regions can form large PcG-associated domains. How these domains, which can differ from cell type to cell type, are created is not understood.

In this review, we focus on the initial steps of recruitment of these complexes, which are likely to be mechanistically separable from later maintenance phases. For the MSL complex, the CESs comprise the set of sites that are initially targeted. For the PcG, PREs are generally considered 'initial targets'. In the context of this review, PREs guide the PcG to lineage-specific target sites during embryonic development. Later phases of PcG association probably form a self-perpetuating chromatin state, in which complex retention at target sites is stable through the cell cycle, possibly by a modified 'nucleate and spread' mechanism from selectively retained sites [55-57].



3. Biochemical toolbox: DNA/chromatin recognition properties of the core complexes 

In trying to understand the targeting of protein complexes to their sites of action throughout the genome, there are several factors to consider (figure 1). As a starting point, we catalogue the proteins of our model groups, paying particular attention to the domain architecture relevant to DNA/chromatin interaction.

The MSL complex is targeted to the male X chromosome with virtually complete fidelity. How this is accomplished remains a mystery, given our current understanding of the members of the complex and their functions. An examination of the known domains in the MSL proteins reveals that none is predicted to bind DNA in a sequence-specific manner. Several, however, carry domains that are well characterized to interact with chromatin. Additionally, each member may confer unique biochemical activities on the whole.

There are two main enzymatic activities known to be required for MSL function. The first, RNA helicase activity, is conferred by Maleless (MLE) [23]; recent work suggests that this activity may be critical for roX RNA remodelling and complex assembly [58]. The second, histone acetyltransferase activity directed towards H4K16, is catalysed by Males absent on the first (MOF) [21,59] and is likely to be key to the increase in transcriptional activity of male X-linked genes. While these activities are both essential for MSL function, they appear to be dispensable for the initial targeting of the complex. More recently, the groups of Dou and Becker showed that MSL2 has E3 ubiquitin ligase activity in vitro. Wu et al. provided evidence that mammalian and fly MSL2 had activity in association with MSL1, leading to ubiquitination of H2B K34 (H2B K31 in flies) [60]. By contrast, Villa et al. found that Drosophila MSL2 ubiquitinates other MSL components, probably serving a stoichiometry-balancing role [61]. Whether either of these functions is essential for dosage compensation has not yet been reported.
Figure 2. The MSL spreading model, the MRE and CLAMP. (a) The Male-Specific Lethal dosage compensation complex consists of five core protein members and one of two non-coding RNAs. Male-specific lethal (MSL) 1, 2 and 3, Males absent on the first (MOF) and Maleless (MLE) are thought to initially assemble cotranscriptionally with one of the two roX (RNA on the X) RNAs. The MOF subunit acetylates histone H4 at lysine 16 (H4K16Ac). Adapted from Gelbart & Kuroda [1]. (b) The MSL complex is proposed to spread to most active genes on the male X chromosome in a stepwise model. First, the complex is assembled cotranscriptionally at the roX loci (red boxes). It is then targeted to approximately 250 chromatin entry sites (CESs) along the X in a sequence-dependent manner (peach boxes). Finally, the complex spreads from the CESs to most active genes in a sequence-independent manner (black boxes). Adapted from Gelbart & Kuroda [1]. (c) Analysis of the MSL recognition elements (MREs) within CESs revealed a degenerate GA-rich sequence that is required for MSL binding when tested in a transgenic context. Additionally, the regions immediately surrounding the CESs generally have low GC content, whereas the 10 kb flanking regions are generally GC rich. From Alekseyenko et al. [2,3], reproduced with permission. (d) The direct DNA-binding sequence motif identified for CLAMP is very similar to the observed MRE sequence. From Soruco et al. [4].



Non-enzymatic domains with known chromatin interaction capabilities are also present. MSL3 and MOF both contain chromodomains, which are found in various chromatin-modifying proteins and often interact with methylated histones. The MSL3 chromodomain has been reported to bind H3K36me3 [52], H4K20me1 [18] and H4K20me2 [62]. The MSL3 chromodomain plays a role in the spread of MSL to all active genes but, like the enzymatic activities above, is dispensable for initial targeting [63]. Unsurprisingly, MLE contains several RNA-interacting motifs, including double-stranded RNA-binding domains, a DExH helicase domain and a C-terminal glycine-rich region, which could be used for engaging chromatin via RNA [24,64].

Fauth et al. examined the DNA-binding capacity of the MSL1 and MSL2 proteins in vitro, using electrophoretic mobility shift assays with recombinant proteins, and demonstrated that the CXC domain in MSL2 exhibits non-specific affinity for DNA [15]. Recently, Straub and co-workers developed a novel approach in which they manipulated biochemical conditions prior to immunoprecipitation to probe for high-affinity protein-DNA interactions that might be captured within the nucleus. They used low-percentage formaldehyde to reduce the number of chemical cross-linking events and high-energy chromatin shearing to disrupt indirect protein-DNA interactions, ideally preserving only the most DNA-proximal interactions [65]. Under these conditions, MSL proteins are partially degraded, and the surviving epitopes for MLE and MSL2 colocalize at CESs, suggesting that these two proteins may directly contact the initial targeting sites. This result contrasts with previous genetic models, in which MSL1 and MSL2 function together at CESs [1]. In the high-energy-shearing experiments, MSL3 maintains its normal, broad localization. Perhaps because of its small size, MSL3 survives the high-energy shearing best, leaving its full pattern apparent. Cognate experiments using the ChIP-exo [66] or MNase ChIP techniques [67,68], which allow greater resolution without the potential loss of protein integrity from high-energy chromatin shearing, would be of great value to confirm the proposed high-resolution mapping of MSL2 and MLE specifically to CESs.

Table 1. Genetically defined members of the MSL complex and the PcG. The genetically identified members of the MSL complex and the PcG are listed as found in purified complexes [5-12]. Known RNA/DNA/chromatin interaction domains are listed, as well as additional domains not typically observed to have this function. Structural studies have informed the understanding of molecular mechanisms. Relevant Drosophila structural data are provided with Protein Data Bank (PDB) identifiers. Escl, Extra sex combs-like; Pcl, Polycomb-like; Sxc (Ogt), Super sex combs (O-glycosyltransferase); Crm, Cramped.

| protein | complex | domains associated with RNA/DNA/chromatin interaction | other domains | relevant Drosophila structural data: PDB IDs | domain and structural references |
|---|---|---|---|---|---|
| MSL1 | MSL [12] | | coiled-coil, PEHE | | [13,14] |
| MSL2 | MSL | CXC | RING | CXC: 2LUA | [14-17] |
| MSL3 | MSL | chromodomain | MRG | chromodomain: 3M9Q | [18-20] |
| MOF | MSL | chromodomain, Zn finger (C2HC), HAT | | chromodomain: 2BUD | [21,22] |
| MLE | MSL | RB1, RB2, ATPase/helicase, Gly rich | | | [23,24] |
| Pc | PRC1 [5] | chromodomain | | chromodomain: 1PFB, 1PDQ | [25-27] |
| Ph | PRC1 | Zn finger (FCS) | SAM | SAM: 1PK1, 1KW4 | [28,29] |
| Psc | PRC1/dRAF [9] | | RING | | [30] |
| Su(z)2 | | | RING | | [30] |
| dRing | PRC1/dRAF | | RING | | [31-33] |
| Scm | PRC1 | Zn finger (FCS) | MBT, SAM | SAM: 1PK1, 1PK3; MBT: 2R57, 2R58, 2R5A, 2R5M | [28,34,35] |
| E(z) | PRC2 [6-8] | SANT, SET | CXC | | [36,37] |
| Esc | PRC2 | | WD40 | | [38] |
| Escl | | | WD40 | | [39] |
| Su(z)12 | PRC2 | Zn finger (C2H2) | VEFS-box | Su(z)12/Nurf55 interaction: 2YB8 | [40,41] |
| Pcl | PRC2 | | Zn fingers (PHD), Tudor | Tudor: 2XK0 | [42,43] |
| Pho | PhoRC [11] | Zn finger (C2H2) | | Pho/Sfmbt interaction: 4C5E, 4C5G, 4C5H | [44,45] |
| Phol | | Zn finger (C2H2) | | | [46] |
| dSfmbt | PhoRC | Zn finger (FCS) | MBT, SAM | MBT: 3H6Z; Pho/Sfmbt interaction: 4C5E, 4C5G, 4C5H | [44,47] |
| Asx | PR-DUB [10] | | Zn finger (PHD) | | [48] |
| Calypso | PR-DUB | | Peptidase C12 | | [10] |
| Sxc (Ogt) | | | TPR | | [49] |
| Crm | | SANT | | | [50] |

Similarly, the PcG contains many proteins that have chromatin-interacting domains, but few that are capable of binding DNA in a sequence-specific manner. A substantial number of the known PcG proteins have been identified as subunits of two main PcG complexes, Polycomb Repressive Complex 1 and 2 (PRC1 and PRC2). The PRC1 complex is thought to act as a direct executor of target gene silencing through inhibition of chromatin remodelling and chromatin compaction [5]. Polycomb (Pc), Polyhomeotic (Ph), Posterior sex combs (Psc) and Sex combs extra (Sce, aka dRing) are the core subunits of PRC1 [31]. Suppressor of zeste 2 (Su(z)2), which is functionally redundant with Psc [69], and Sex comb on midleg (Scm) copurify with PRC1 at substoichiometric levels and are also categorized as PRC1 subunits. RNA interference (RNAi) knockdown experiments in tissue culture cells have suggested that Scm may be particularly important for targeting of PcG complexes to PREs [70].

Figure 3. PcG complexes, PRE architecture and pairing-sensitive silencing (PSS). (a) Multiple Polycomb group complexes have been characterized: Polycomb Repressive Complex 1 and 2 (PRC1, PRC2), Pho-repressive complex (PhoRC), dRING-associated factors (dRAF) and Polycomb repressive deubiquitinase (PR-DUB). E(z) catalyses trimethylation of histone H3 on lysine 27 (H3K27me3). The chromodomain of Pc is known to recognize this mark. Pho (and the related Phol) are the only PcG proteins with characterized sequence-specific DNA binding. dRING (also called Sce, Sex combs extra) catalyses H2A ubiquitination in the context of the dRAF complex, but not in the context of PRC1. The PR-DUB complex removes this H2A ubiquitination. The reason for ubiquitination cycling on H2A is not fully understood. Adapted from Schwartz & Pirrotta [53]. (b) The engrailed PREs and the iab-7/Fab-7 PRE are schematized with identified consensus sequences for various DNA-binding factors. Despite the large number of potential interactors, no single motif is sufficient to predict PREs. Adapted from Brown & Kassis [54]. (c) PSS is a phenomenon in which expression of a homozygous transgene that contains a PRE is lower than in the heterozygous case, suggesting that homologue pairing enhances silencing mediated by the PRE. In comparison, homozygosity for transgenes lacking PREs generally leads to higher expression levels. The gradation of expression is schematized for a mini-white transgene in a white mutant genetic background. The white gene is responsible for red eye colour in flies.

The PRC2 complex is responsible for trimethylation of histone H3 lysine 27 (H3K27me3) [6-8,71], a mark of transcriptional repression that is known to be recognized by the chromodomain of Pc. PRC2 consists of Enhancer of zeste (E(z)) as its catalytic subunit, with Suppressor of zeste 12 (Su(z)12), Extra sex combs (Esc) and Nurf55 (aka Caf1) as non-catalytic subunits. Recent work suggests that the zinc-finger domain of Su(z)12 may play a role in association with chromatin, as its deletion led to loss of viability in transgenic lines and loss of localization to PREs in cell culture [72]. It is not yet known whether this is a direct interaction or whether it is mediated by one or more accessory factors.



In addition to PRC1 and PRC2, other PcG complexes have been identified in Drosophila. The dRING-associated factors (dRAF) complex, which shares Psc and Sce/dRing with PRC1, contains the demethylase dKDM2 and is involved in H3K36me2 demethylation and H2A ubiquitylation [9]. Polycomb repressive deubiquitinase (PR-DUB), another PcG complex, consists of Additional sex combs (Asx) and the ubiquitin carboxy-terminal hydrolase Calypso, and specifically removes monoubiquitin from histone H2A [10]. The mutant phenotypes of these proteins demonstrate that the ubiquitination/deubiquitination cycle of H2A is important for PcG repression.

The Pho-repressive complex (PhoRC), consisting of the DNA-binding proteins Pleiohomeotic (Pho) or Pleiohomeotic-like (Phol) together with Scm-related gene containing four mbt domains (Sfmbt), is the only PcG complex shown to have sequence-specific DNA-binding activity [11,44]. Two lines of evidence led to a model of hierarchical recruitment of PcG complexes [73]: an interaction between PRC2 components and the DNA-binding proteins Pho and Phol, and the requirement of Pho for PRE binding of E(z). In this model, Pho and Phol bind to PREs and, through their interaction with PRC2, recruit the complex to PREs. Subsequently, E(z) methylates H3K27, which results in the recruitment of PRC1 through recognition of the histone mark by the Pc chromodomain. However, this simple model is not sufficient to explain PcG silencing. PRC1 and PRC2 components are still visible by immunostaining at many sites on polytene chromosomes in pho and pho-like double mutants [46], suggesting that additional DNA-binding factors are likely to be involved in PcG recruitment and silencing. In addition, competing studies disagree about the extent of colocalization between Pho and PRC1. One puts the colocalization between Pho and PRC1 (as mapped by the intersection of Pc and Ph) at 96%, whereas the other reports only 50% between Pho and PRC1 (as mapped by Pc alone) [74,75]. One aspect that may confound comparisons of this nature is that there may be inherent differences between chromatin from different sources (i.e. tissue culture versus embryo versus different tissues). The antibodies used also play a role, as two separate validated antibodies to the same PcG subunit produce different patterns [76]. Another consideration is that many peaks observed for individual PcG factors are not shared by other members of the PcG. In addition, PREs in ChIP studies are defined by co-enrichment peaks, most of which have not been functionally validated as PREs. Furthermore, numerous transcription factors have binding sites within each functionally validated PRE (see below).
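Part of the 96% versus 50% discrepancy can come from the choice of reference set alone. The toy sketch below uses invented intervals and hypothetical helper names. It computes the fraction of peaks in one set that overlap at least one peak in another, first with "PRC1" defined as Pc peaks that overlap Ph peaks, and then with Pc peaks alone. Real analyses would use dedicated interval tools and peak-calling thresholds, which this sketch ignores.

```python
# A peak is (chromosome, start, end) in half-open coordinates.
Peak = tuple[str, int, int]


def overlaps(a: Peak, b: Peak) -> bool:
    """True if two half-open intervals on the same chromosome share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def intersect(a: list[Peak], b: list[Peak]) -> list[Peak]:
    """Peaks in a that overlap any peak in b (a crude 'co-bound' set)."""
    return [p for p in a if any(overlaps(p, q) for q in b)]


def fraction_overlapping(query: list[Peak], reference: list[Peak]) -> float:
    """Fraction of query peaks that overlap at least one reference peak."""
    if not query:
        return 0.0
    return len(intersect(query, reference)) / len(query)


if __name__ == "__main__":
    # Invented coordinates, for illustration only.
    pc = [("2R", 100, 200), ("2R", 500, 600), ("3L", 1000, 1100), ("X", 50, 150)]
    ph = [("2R", 150, 250), ("3L", 1050, 1200)]
    pho = [("2R", 120, 130), ("3L", 1080, 1090), ("X", 5000, 5100)]

    prc1_strict = intersect(pc, ph)                # Pc peaks also bound by Ph
    print(fraction_overlapping(prc1_strict, pho))  # 1.0
    print(fraction_overlapping(pc, pho))           # 0.5
```

In this toy case the same Pho peaks give 100% or 50% colocalization depending only on how "PRC1" is defined. Differences in tissue source or antibodies would add to any such effect.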

In reviewing the characterized biochemical activities available to the MSL and PcG complexes, it is clear that both groups have proteins with the ability to interact with DNA or chromatin. Both groups are capable of catalysing histone modifications known to be associated with chromatin state, and both have ubiquitin ligase activity. The general DNA/chromatin affinity observed for these complexes makes sense, because it would allow maintenance of chromatin states after their initial establishment. However, for both the MSL complex and the PcG, many details regarding initial targeting are still lacking.

4. Sequence motifs and binding proteins 

In the spreading model, the MSL complex initially binds a set of CESs containing a degenerate sequence motif (figure 2c). Reduction of the MSL complex by genetic [2] or RNAi [77] means revealed 150-300 CESs, defined as the ChIP-enriched peaks that persist under these conditions. The majority contain a (GA)4 core sequence, with flanking GA enrichment encompassing a 21-29 bp motif termed the MSL recognition element (MRE). CESs moved to autosomes as 150 bp transgenic segments attract the MSL complex, whereas mutant versions in which the consensus MRE is disrupted fail to attract the complex. However, the MRE is only 1.5- to 2-fold enriched on the X chromosome (depending on the stringency of the motif search parameters), yet this modest enrichment yields nearly perfect fidelity for the X chromosome. Philip et al. [78] looked into this conundrum further. They analysed the sequence composition biases of the X chromosome versus the autosomes, and of MSL-bound versus MSL-unbound genes on the X, by parsing the chromosomes into all 2- to 6-mer 'words' and analysing the frequencies of these words. They determined that X chromosome genes are characteristically GC rich, not just in D. melanogaster but also across the Drosophila genus. Thus, an interesting possibility is that the primary sequence composition surrounding the MRE motif may play a role in site selection. Subsequent work lends credence to this idea by detecting a characteristic GC enrichment signature in sequences flanking MREs [3]. However, additional unknown specificity factors must still be invoked to explain the strong fidelity of the MSL complex for the X chromosome.
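The word-frequency approach of Philip et al. can be sketched as counting every k-mer, for k = 2 to 6, in two sequence sets and comparing their relative frequencies. The sketch below assumes a log2 frequency ratio with a pseudocount as the comparison statistic. The statistic actually used in that study may differ, and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

VALID = set("ACGT")


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers, skipping windows that contain N or other symbols."""
    seq = seq.upper()
    return Counter(
        seq[i:i + k] for i in range(len(seq) - k + 1) if set(seq[i:i + k]) <= VALID
    )


def word_bias(foreground: str, background: str, k: int, pseudocount: float = 1.0) -> dict[str, float]:
    """log2(frequency in foreground / frequency in background) for every possible k-mer."""
    fg, bg = kmer_counts(foreground, k), kmer_counts(background, k)
    n_words = 4 ** k
    fg_total = sum(fg.values()) + pseudocount * n_words
    bg_total = sum(bg.values()) + pseudocount * n_words
    bias = {}
    for letters in product("ACGT", repeat=k):
        word = "".join(letters)
        # Pseudocounts keep the ratio finite for words absent from one set.
        f = (fg[word] + pseudocount) / fg_total
        b = (bg[word] + pseudocount) / bg_total
        bias[word] = math.log2(f / b)
    return bias


if __name__ == "__main__":
    x_like, autosome_like = "GCGCGCGGCC", "ATATATTAAT"  # toy sequences, not genomic data
    for k in range(2, 7):
        top_word, top_score = max(word_bias(x_like, autosome_like, k).items(), key=lambda kv: kv[1])
        print(k, top_word, round(top_score, 2))
```

Positive scores mark words over-represented in the foreground set, here standing in for X-linked or MSL-bound sequence. A GC-rich X would show up as GC-rich words with positive scores.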

To identify additional proteins involved in targeting of the MSL complex, including general factors that might carry essential functions in both sexes, Larschan et al. [79] used an RNAi screen that culminated in the identification of a novel zinc-finger protein, CG1832, which was also linked biochemically to the MSL complex by ChIP-mass spectrometry [80]. In a major step forward, CG1832, renamed chromatin-linked adaptor for MSL proteins (CLAMP), was discovered to have direct sequence-specific DNA-binding affinity for the MRE motif in vivo and in vitro (figure 2d) [4]. However, CLAMP is bound to MRE sequences throughout the genome and in both sexes, so it provides only a partial explanation for male X specificity, which rests on the following observations. The MSL complex and CLAMP mutually reinforce each other's interaction at CESs. Furthermore, CLAMP is found at more MREs on the X chromosome than on the autosomes in female cells, suggesting that CLAMP has higher affinity for the X even in the absence of the MSL complex. Based on the relative occupancy levels of CLAMP at sites on the X, Soruco and co-workers [4] suggest a model in which binding at the roX2 locus and adjacent chromatin acts as a 'beacon', along with the roX RNAs themselves, for MSL recruitment and synergistic spread to MREs along the X. Thus, the characterization of CLAMP provides a direct link between the MSL proteins and sequence-specific recognition of the MRE. Together with the roX RNAs, it provides at least a partial explanation for initial X recruitment.

In contrast to the MSL complex, structural and functional analysis of PREs, which were originally characterized in the Bithorax cluster, has shown that there is no strong consensus sequence that can simply be defined as a PRE motif. Instead, PREs often contain diverse combinations of sequence motifs for multiple DNA-binding proteins, including Pho/Phol, Sp1/KLF proteins, GAGA factor (GAF)/Pipsqueak (Psq), Dorsal switch protein 1 (Dsp1), Grainyhead (Grh) and Zeste [81]. These motifs reflect a constellation of transcription factors that may establish initial target gene expression levels (figure 3b).



The core consensus motif of Pho is GCCAT, and longer versions of the Pho consensus sequence have been detected by genome-wide analyses [74,75,82]. If we consider PREs to be the ChIP-enriched peaks of PRC1 (intersection of Pc and Ph), then Pho sites are estimated to overlap with approximately 96% of mapped PREs. The importance of Pho binding motifs in PRE-mediated silencing has been demonstrated by transgenic analyses using diverse PREs [54,83,84]. Pho-like (Phol) shares 80% sequence identity with Pho in the zinc-finger region and can also bind to the Pho consensus motif in vitro. Furthermore, double mutants of pho and phol have a synergistic effect leading to Hox gene misexpression, suggesting that Pho and Pho-like act redundantly in Hox gene silencing [46]. However, genome-wide analysis of Phol binding shows far less overlap with Ph sites (21%) than that observed between Pho and Ph (see above) [74]. Certainly, Pho and Phol play a major role (perhaps even the major role) in PcG targeting. However, other PcG complexes can still locate a subset of their sites on chromatin in the absence of Pho and Phol, so there must be other mechanisms of recruitment to such loci. Additionally, Pho is not sufficient for PRE activity, as indicated by the large percentage of Pho sites that are not bound by Pc and Ph.

The Sp1/KLF protein consensus sequence is (G/A)(G/A)GG(C/T)G(C/T), and the engrailed (en) PREs contain a perfect match to this consensus [85]. Spps, one of the Sp1/KLF family members, not only binds to the engrailed PREs but also shows a binding pattern identical to that of Psc on polytene chromosomes. In addition, depletion of Spps leads to a loss of pairing-sensitive silencing (PSS; figure 3c), a phenomenon in which PRE-mediated repression of mini-white reporter transgenes is strengthened in flies homozygous for the transgene [86].

GAGA factor (GAF) and Pipsqueak (Psq) bind to a GAGAG sequence motif. Both GAF and Psq contain a BTB/POZ domain, which is involved in the formation of homo- or heterodimers, and the two proteins can interact with each other [87]. Genome-wide analysis revealed that GAF colocalizes with about half of Ph binding sites [74]. GAF binding sites in the even skipped (eve) PRE are necessary for PcG-mediated silencing, and GAF is required for binding of Pho to PRE chromatin in vitro [83,88]. GAF binding sites have subsequently been shown to be required in other PREs as well [81]. GAF is reported to remodel chromatin in vitro [89] and to recruit chromatin remodelling factors [90]. This suggests that one role of GAF at PREs is to mediate nucleosome depletion, allowing other regulators to bind.

The Dsp1 consensus sequence, GAAAA, is found at positions close to or overlapping with Pho sites in diverse PREs. Removal of Dsp1 binding sites from the Fab-7 PRE of Abdominal-B (Abd-B) and from one of the en PREs results in loss of PcG-mediated silencing [91]. Genome-wide analysis shows that Dsp1, like GAF, is present at about half of Ph binding regions, even though the Dsp1 consensus sequence does not correlate strongly with the observed Dsp1 genomic localization [74].

The Grh and Zeste consensus sequences are (A/T)C(T/C/A)GGTT and (T/C/G)GAGTG(A/G/C), respectively. Grh binds to the iab-7 PRE and also interacts with Pho in vitro and genetically [92]. Zeste was initially reported as a component of the PRC1 complex [32] and is required for maintenance of Ubx repression in the embryo [93]. However, in genome-wide analysis, Zeste is present at only a small percentage of Ph binding sites (25%) and Pho binding sites (10%) [74,82]. Additionally, zeste mutants have no observed mutant phenotype (homeotic or otherwise), suggesting that Zeste may be redundant or non-essential [94].
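The consensus strings above use slash notation for degenerate positions. One way to make them searchable is to convert each (X/Y) group into a regular-expression character class and scan both DNA strands. This is sketched here as an illustration, not as the method of any cited study. The helper names and the choice to scan the reverse complement are assumptions, and palindromic motifs would be reported once per strand.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Consensus motifs as written in the text.
MOTIFS = {
    "Pho core": "GCCAT",
    "Sp1/KLF": "(G/A)(G/A)GG(C/T)G(C/T)",
    "Grh": "(A/T)C(T/C/A)GGTT",
    "Zeste": "(T/C/G)GAGTG(A/G/C)",
    "Dsp1": "GAAAA",
}


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Turn '(G/A)'-style groups into '[GA]'; a lookahead allows overlapping matches."""
    body = re.sub(
        r"\(([ACGT/ ]+)\)",
        lambda m: "[" + m.group(1).replace("/", "").replace(" ", "") + "]",
        consensus.upper(),
    )
    return re.compile("(?=(" + body + "))")


def scan_both_strands(seq: str, consensus: str) -> list[tuple[int, str]]:
    """Return (0-based start in forward-strand coordinates, strand) for every match."""
    seq = seq.upper()
    rx = consensus_to_regex(consensus)
    hits = [(m.start(), "+") for m in rx.finditer(seq)]
    rev = seq.translate(COMPLEMENT)[::-1]
    for m in rx.finditer(rev):
        width = len(m.group(1))
        # A match at [s, s + width) on the reverse strand covers [n - s - width, n - s) forward.
        hits.append((len(seq) - m.start() - width, "-"))
    return sorted(hits)


if __name__ == "__main__":
    print(scan_both_strands("TTGGGGCGCTT", MOTIFS["Sp1/KLF"]))  # [(2, '+')]
    print(scan_both_strands("AGCGCCCCA", MOTIFS["Sp1/KLF"]))    # [(1, '-')]
```

Any real PRE sequence will contain many short matches like these by chance, which is one reason motif presence alone predicts PREs poorly.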

Paradoxically, many of these DNA-binding proteins with consensus motifs found in PREs are also implicated in transcriptional activation, and, furthermore, the corresponding mutants do not show clear PcG mutant phenotypes [95]. In fact, an entire group of transcriptional activators, the Trithorax group (TrxG), is also found to localize to many PREs. The TrxG is typically thought of as antagonistic to PcG function, and models propose that competition between the TrxG and PcG for binding at PREs may play a role in maintenance of transcriptional state [96]. An algorithm that used the consensus motifs discussed above to predict PRE sites identified 167 candidates [97], but only 32 of these overlapped with Ph binding sites revealed by genome-wide analysis [74]. The low predictive power of this algorithm suggests that many other factors and parameters for PRE recognition are still unknown.
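The predictive power of that algorithm can be summarized as a precision. This framing is ours rather than the cited study's, and it treats overlap with a genome-wide Ph binding site as an imperfect proxy for a true PRE:

```latex
\text{precision} = \frac{\text{predicted PREs overlapping Ph sites}}{\text{all predicted PREs}} = \frac{32}{167} \approx 0.19
```

That is, roughly one prediction in five coincides with a Ph-bound region.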

5. Local chromatin environment 

The modENCODE project has catalogued the distribution of selected chromatin marks in the Drosophila genome. The aggregate data for 18 histone modifications revealed nine major chromatin types based on their combinatorial composition. Cross-analysis also included binding of a number of non-histone chromatin proteins and DNase I hypersensitivity, as well as gene structure and expression, to further describe the state of the chromatin. Unsurprisingly, transcriptionally active chromatin and repressed chromatin were easily identifiable. Different states of transcription (e.g. transcription start sites, transcriptional elongation) and different types of repressed chromatin (e.g. pericentromeric heterochromatin, PcG-repressed domains) each bore their own unique signatures. The male X chromosome displayed a unique form of active chromatin, characterized by the H4K16Ac mark, a hallmark of MSL-mediated dosage compensation (figure 4a) [98].

The modENCODE data allowed Alekseyenko et al. [3] to explore the role of local chromatin environment in the choice of MRE sequences to be used as CESs. They compared the chromatin environment of 150 bound CESs with several alternative groups of unused MREs. The results showed that functional MREs carry a clear signature: enrichment of the H3K36me3 mark, of Jil-1 (a kinase enriched on the X with the MSL complex) and, unsurprisingly, of the H4K16Ac mark. Notably, the enrichment of these three factors is present in both male and female tissue culture cells, suggesting that they are more than merely a consequence of MSL binding. Furthermore, as mentioned above, a characteristic sequence bias exists around functional MREs. The surrounding 1 kb (centred on the MRE) showed an AT enrichment, whereas the surrounding 10 kb (excluding the central 1 kb) was characteristically GC rich. A predictive model using chromatin marks and AT richness as factors met with reasonable success, suggesting that chromatin environment is likely to play a role in initial targeting of the MSL complex. However, even given enrichment for all of these factors, it remains impossible to discriminate with certainty between a functional MRE and one that is not used. The ultimate puzzle to be solved is why autosomal sites meeting all parameters still do not autonomously attract the MSL complex, whereas similar-looking sequences moved from the X to an autosome do function to attract the complex.
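The sequence-context comparison described above (AT content in the central 1 kb versus the surrounding 10 kb) can be sketched as follows. This is a minimal illustration, assuming that "the surrounding 10 kb" means a single 10 kb window centred on the MRE (rather than 10 kb on each side) and that the chromosome sequence is available as a plain string; the function names are hypothetical, and this is not the analysis pipeline used by Alekseyenko et al.

```python
def at_fraction(seq: str) -> float:
    """Fraction of A/T among unambiguous bases; N and other IUPAC codes are ignored."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return at / (at + gc) if at + gc else float("nan")


def mre_context(chrom: str, centre: int, inner: int = 1_000, outer: int = 10_000):
    """Return (AT fraction of the inner window, AT fraction of the outer window
    with the inner window removed). Both windows are centred on `centre`
    (an assumed reading of 'surrounding') and clipped at the chromosome ends."""
    in_start = max(centre - inner // 2, 0)
    in_end = min(centre + inner // 2, len(chrom))
    out_start = max(centre - outer // 2, 0)
    out_end = min(centre + outer // 2, len(chrom))
    core = chrom[in_start:in_end]
    flanks = chrom[out_start:in_start] + chrom[in_end:out_end]
    return at_fraction(core), at_fraction(flanks)


# Toy chromosome: a 1 kb AT-rich block embedded in GC-rich sequence.
toy = "GC" * 5000 + "AT" * 500 + "GC" * 5000
core_at, flank_at = mre_context(toy, centre=10_500)
print(core_at, flank_at)  # prints 1.0 0.0
```

A functional MRE in this scheme would show a higher AT fraction in the core than in the flanks; as the text stresses, such features narrow the candidates but do not by themselves identify which MREs are used.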
Figure 4. Chromatin environment and long-range chromatin interactions. (a) Analysis of 18 histone modifications (left) by the modENCODE project led to the identification of signatures for nine different chromatin types. Type 5 (green) bore the signature of dosage-compensated chromatin (note the enrichment of the MOF-catalysed H4K16Ac mark), whereas type 6 (grey) was identified as PcG-associated chromatin (note the enrichment of the E(z)-catalysed H3K27me3 mark). Cross-validation with other chromatin proteins and with genomic features (right) confirms many of the predicted enrichments/depletions. From Kharchenko et al. [98]. (b) Schematic for results of Hi-C experiments that capture long-range interactions between PcG-bound regions. The highest-frequency contacts are observed along the diagonal because interactions are strongest over short linear distances. However, high contact frequencies are also observed between the Bithorax and Antennapedia clusters, which are regulated by the PcG. Adapted from Sexton et al. [99].



PREs have different combinations of diverse motifs without any preferential number, order or spacing [54]. Furthermore, the number and order of motifs differ significantly among Drosophila species, even within orthologous PREs [54]. It is unclear, however, whether the diverse PRE topologies have different functional effects or whether they have similar functional activities independent of their sequence organization. To investigate this question, selected PREs have been analysed using transgenes that contain a PRE and a reporter gene, such as mini-white or lacZ; these transgenic analyses have revealed that the activities of PREs are highly influenced by genomic context. For example, transgenes containing en or invected (inv) PREs show PSS at only about 60% and 20-45% of insertion sites, respectively [100]. This phenomenon may be due to the effects of neighbouring regulatory elements on PRE activity. In a screen for dominant suppressors of PSS, Noyes et al. [101] observed that gain-of-function mutations in the transcriptional activator Woc can block en PRE activity. This suggests that, in the context of this reporter assay, the PRE and neighbouring regulatory elements, such as enhancers, compete for control over the transcriptional state of the reporter gene.

Insulators have been identified as another type of regulatory element that affects PRE activity, potentially by blocking the spreading of PcG proteins or H3K27me3 marks [102-104]. Genome-wide analysis shows that diverse insulator proteins such as Su(Hw), CP190 and dCTCF are broadly distributed throughout the genome [105,106]. Interestingly, PREs at many sites, including the Hox gene region, are flanked by insulator elements, suggesting that the flanking insulators protect neighbouring genes from inappropriate silencing by PREs as well as from inappropriate activation by enhancers. The removal of insulator binding sites or the depletion of insulator proteins can result in lower H3K27me3 within these domains, but appears to have little effect on spreading beyond the borders, although such spreading would be expected if insulators were the sole causative agent [107]. Alternatively, a recent study using a Fab-7 PRE transgene showed that spreading of H3K27me3 is blocked by RNA polymerase II-bound promoter regions and active chromatin marks rather than by insulator elements [40,108,109]. It is likely that multiple mechanisms contribute to the delineation of a PcG domain.
mechanisms contribute to the delineation of a PcG domain. 

The sensitivity of PREs to genomic context makes comparison of functional differences between PREs challenging. Therefore, site-specific recombination tools such as gene conversion and ΦC31 integration have been very important for examining the effects of mutation or deletion of sites within a PRE, and for direct comparison between different PREs in a constant genomic context [84,110]. Using the gene conversion technique, Kozma et al. [84] showed that binding motifs of Pho and GAF in the bithoraxoid (bxd) PRE are cooperatively required for silencing by the PRE, whereas the Dsp1 binding motif is not essential for PRE activity. In addition, replacement of the bxd PRE in a reporter construct with the iab-7 or iab-5 PREs from Abd-B revealed that these PREs are interchangeable, indicating equivalent functional capabilities. On the other hand, direct comparison between the Fab-7 PRE and one of the vestigial (vg) PREs at four ΦC31 site-specific integration loci demonstrated that these two PREs exhibit differential silencing traits in a genomic-context-dependent manner, indicating that different PREs can have distinct properties [110]. Furthermore, a recent functional analysis of two different en PREs supports the idea that different PREs produce different functional outcomes. PRE1 and PRE2 of the engrailed gene not only have a different number, order and spacing of motifs, but also require different numbers of Pho motifs for silencing activity. In addition, an AT-rich region found only in PRE1 is required for full PSS activity. These differences led to distinct expression patterns in embryonic and larval reporter assays in which the reporter construct was integrated at the same genomic site [54].
Figure 5. Updated models. (a) The core MSL complex is targeted to MREs in CESs in part by the DNA-binding activity of CLAMP, which provides a sequence-specific molecular link between the complex and DNA. Selection of CESs also strongly favours an active chromatin context, with pre-existing enrichment of the histone marks H3K36me3 and H4K16Ac and the chromosomal kinase Jil-1 that can be observed even in female cells (and is thus not due to the presence of the MSL complex itself). (b) Insulators contribute to the long-range interactions between PREs. In this way, PcG complexes may be brought to distal sites of activity. As yet unresolved, however, is the identity of factors that are critical for initial targeting of the PcG complexes. Certainly, Pho plays a major role, but other, unidentified factors also clearly take part.

6. Higher-order chromatin organization 

An additional avenue that has yet to be fully explored with respect to MSL complex targeting is the three-dimensional structure and folding of the genome [111]. This aspect of the larger chromatin environment has recently been the subject of intense investigation in the chromatin field [99,112], but without specific attention to dosage compensation. The X chromosome, like all chromosomes, clearly forms its own territory within the nucleus. However, small 150 bp CES segments inserted on autosomes can still attract the MSL complex to their ectopic sites, so any model invoking global chromosomal structure will need to incorporate the apparent ability of these segments to act autonomously wherever inserted.

Much more intensive investigation of higher-order structure has occurred in the PcG field. Through PRE transgenic analyses and techniques such as DNA adenine methyltransferase identification (DamID), chromosome conformation capture assays (3C and associated technologies) and fluorescent in situ hybridization, it has been reported that PcG-targeted regions can contact other PcG-bound regions or promoters over long distances [113-116]. On the basis of these observations, it has been proposed that the long-distance contacts are mediated by interaction between PcG proteins bound at distant PREs, creating large loop structures that contribute to higher-order chromatin structure. Such interactions clearly exist between the BX-C and ANT-C, as observed in recent Hi-C chromatin capture experiments (figure 4b) [99]. In further support of this idea, PcG proteins are visualized by immunofluorescence as nuclear speckles termed 'Polycomb bodies', of which there are significantly fewer than the number of PcG binding sites found by genome-wide analysis. Therefore, Polycomb bodies are thought to be foci formed by long-range interactions among PcG proteins bound to several PREs, although large PcG-bound chromatin domains spanning more than 100 kb (such as Hox gene cluster regions) may by themselves be visualized as Polycomb bodies.
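Because contact frequency falls steeply with linear distance (the strong diagonal in figure 4b), a long-range contact such as that between BX-C and ANT-C is easier to see after normalising for distance. One common approach is observed/expected normalisation: divide each count by the mean count at the same genomic separation. The sketch below assumes a symmetric, already bias-corrected contact matrix of equal-sized bins; it illustrates the general technique and is not necessarily the procedure used by Sexton et al. [99].

```python
import numpy as np


def observed_over_expected(contacts: np.ndarray) -> np.ndarray:
    """Divide each cell of a symmetric contact matrix by the mean count at the
    same genomic distance (i.e. on the same diagonal). Values above 1 mark
    contacts more frequent than expected for that separation."""
    contacts = np.asarray(contacts, dtype=float)
    n = contacts.shape[0]
    expected = np.zeros_like(contacts)
    for d in range(n):
        mean_d = np.diagonal(contacts, offset=d).mean()
        i = np.arange(n - d)
        expected[i, i + d] = mean_d  # upper diagonal at distance d
        expected[i + d, i] = mean_d  # mirrored lower diagonal
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(expected > 0, contacts / expected, np.nan)


# Toy 5-bin matrix: counts decay with distance, but bins 0 and 3 contact each
# other more often than bins 1 and 4, which are equally far apart.
toy = np.array([
    [100,  50,  20,  30,   5],
    [ 50, 100,  50,  20,  10],
    [ 20,  50, 100,  50,  20],
    [ 30,  20,  50, 100,  50],
    [  5,  10,  20,  50, 100],
])
oe = observed_over_expected(toy)
print(oe[0, 3], oe[1, 4])  # prints 1.5 0.5
```

In the toy matrix the raw count for bins 0 and 3 (30) is far below the diagonal counts (100), yet after normalisation it stands out as enriched relative to other contacts at the same separation.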

Recently, however, the idea that PcG proteins themselves mediate long-range interaction has been challenged. Using transgenes containing the Mcp and Fab-7 elements, Li et al. [117] showed that the interaction between Mcp and Fab-7 elements does not depend on the ability to recruit PcG complexes, but rather on insulator DNA sequences flanking or within the PRE element. In subsequent investigations by the same group, CTCF binding sites were observed to be critical for long-range interactions of reporter constructs; in comparison, the PRE was needed only for proper and stable sorting to subnuclear structures [118]. Additionally, this work suggests that interactions between insulators, enhancers and PREs may be quite complex, perhaps so complex that they are difficult to model with ectopic transgenic methodology and individual transgenes (figure 5).

7. Involvement of RNAs 

From the topics covered so far (the biochemical properties of the proteins and complexes themselves, the sequence motifs associated with their binding, and the state of the chromatin around targets, both locally and at longer range), it seems clear that each of these factors plays at least an incremental role in determining the initial targeting of chromatin complexes. In this section, we consider evidence for the role of RNAs in the initial targeting of these model complexes.

The long non-coding roX RNAs are an integral part of the MSL complex [119]. They are critical for the establishment of the full pattern of MSL binding along the male X chromosome [120]. However, in the absence of both transcripts, there is still limited assembly and targeting of the complex. In a recent advance, biochemical analysis provided evidence that ATP-dependent remodelling of the stem-loop structures of the roX RNAs by MLE is a critical step in complex assembly [58]. Interestingly, mutation of the structure of these loops can lead to mislocalization of the MSL complex [121,122]. However, in all cases of roX mutation, residual targeting of the X remains intact, suggesting that the RNA may function primarily in the spreading rather than the initial targeting step. Long non-coding (lnc) RNAs have not been reported in PcG targeting in Drosophila, in contrast to mammalian systems, in which numerous lncRNAs have been proposed to function in PcG targeting [123].

A recent report proposes a role for the small interfering RNA (siRNA) pathway in targeting of the MSL complex to the X chromosome. Menon & Meller [124] observe a synthetic lethal phenotype in male flies between roX RNA double mutants and mutants for the siRNA pathway such as Dicer-2 (Dcr-2), Argonaute 2 (Ago2) and Elongator complex protein 1 (D-elp1). The synthetic lethal phenotype is likely to be due to an accompanying defect in X-chromosome-specific MSL recruitment. The aberrant targeting in these cases ranged from lack of X recruitment, to relocalization to the chromocentre, to appearance at autosomal or telomeric sites. The involvement of ncRNAs other than the roX RNAs in dosage compensation would be a potentially rich source of additional targeting factors. Menon and Meller note, however, that the observed phenotypes may not come from a direct interaction between the siRNA machinery and the MSL complex at dosage-compensated loci, but rather from a more general role of the siRNA machinery in chromatin architecture.

Similarly, the direct involvement of RNAi machinery components in PcG-mediated silencing is being debated in the PcG field. The positive correlation of Ago2 ChIP data with PcG ChIP-chip data from modENCODE led Taliaferro and co-workers to suggest that Ago2-mediated transcriptional repression, which is independent of the catalytic activity of Ago2, might be related to PcG-mediated silencing [125]. Previously, Grimaud and co-workers [126] found that mutations in RNAi components such as Ago1, Piwi, Aubergine (Aub) and Dcr-2 resulted in the derepression of PcG-mediated silencing in PSS assays. However, the appearance of Fab-7 region-directed 21-23 nt small RNAs in transgenic flies was due solely to the Fab-7 element in the transgene; the endogenous locus itself does not stimulate the production of such siRNAs.

Therefore, the role of RNAi components in PcG-mediated silencing still seems uncertain, and is potentially tied to the structure of the specific transgenes used in transgenic assays. A recent analysis of the possible role of RNAi components in PcG-mediated silencing at the endogenous BX-C locus showed that depletion of RNAi components did not result in any significant change in the repressed chromatin state of the locus, and direct comparison of Ago2-bound small RNAs and PcG-target promoter regions showed no enrichment of small RNAs at such PcG targets [127]. In addition, Moshkovich et al. showed that the binding profile of Ago2 overlaps with TrxG proteins as well as PcG proteins, and that Ago2 not only preferentially associates with active promoters, but also acted as a TrxG protein, opposing the Pc mutant phenotype in a genetic screen [128].

In one additional potential link to small RNAs, analysis of the chromatin environment at transcription start sites proximal to PREs showed that these regions are enriched with H3K4me1 and H3K4me2 marks, and many of them generate short RNAs owing to stalled Pol II [98], consistent with the possibility that 5' paused RNAs might play a functional role in PcG targeting [129].



8. Summary 



As can be seen, the constellation of factors and sequence characteristics that may be involved in initial targeting of complexes to chromatin makes investigation of this topic a complicated endeavour. In the case of MSL targeting, primary sequence characteristics of CESs are certainly a major factor, as novel sex chromosomes generated by chromosomal fusions in Drosophila miranda become dosage-compensated by the acquisition of CES-like sequences [130]. The recent characterization of CLAMP has provided a strong molecular link between the MSL complex and its recruitment to MREs in CESs. Whether the D. miranda version of CLAMP can similarly explain the selection of CES-like sequences will be an important test. Interestingly, although the mechanisms of dosage compensation are not conserved in nematodes and mammals (two other intensively studied model systems), the 'nucleate and spread' models for targeting to the X chromosome appear quite similar in all three systems [51,131,132].

PcG targeting seems to involve multiple proteins, and no single consensus sequence (even the Pho/Phol consensus) can adequately predict PcG binding. Furthermore, PREs are typically studied with transgenic reporter assays in which PcG recruitment is not directly assayed; rather, functional silencing of the transgene is tested. However, PcG recruitment does not always lead to silencing, but may instead only modulate transcription levels (e.g. at the Psc/Su(z)2 locus) [104,133-135]. PcG targeting and silencing function, then, may not always be linked. The molecular details of this separation are unclear; however, post-translational modifications of PcG subunits, in addition to the diverse biochemical activities of the multiple complexes, suggest that there could be factors that influence one but not the other [49,136,137].

With the advent of higher-throughput genome editing technologies, such as the CRISPR/Cas9 system, it is becoming more feasible to do the types of experiments that may reveal requirements for targeting and function without removing sites from their endogenous locations. It may finally be possible to expand from single-site assays to look at more global recruitment, for example by mutating a majority of MREs or PREs in a chromosomal region. New chromatin immunoprecipitation techniques that enhance the resolution of binding sites may also help to refine known binding patterns or reveal sequence motifs. Hopefully, these technologies will allow investigators to address the many outstanding questions that still exist in the area of PcG and MSL initial targeting.